Abstract

The functional autoregressive model is a Markov model tailored for data of functional nature. It has proved fruitful for modelling samples of dependent random curves and has been widely studied over the past few years. This article aims at completing the theoretical study of the model by addressing the crucial issue of weak convergence for estimates from the model. The main difficulties stem from an underlying inverse problem as well as from dependence between the data. Traditional facts about weak convergence in nonparametric models appear: the normalizing sequence is an $o(\sqrt{n})$, and a bias term appears. Several original features of the functional framework are pointed out.

Keywords: Functional data, autoregressive model, Hilbert space, weak convergence, random operator, perturbation theory, linear inverse problem, martingale difference arrays.

1 Introduction 

1.1 The model and its history 

The Functional Autoregressive Model of order 1 (FAR(1)) generalizes to random elements with values in an infinite dimensional space the classical AR(1) model, which belongs to the celebrated class of ARMA processes widely used in time series analysis. This model was introduced by Bosq, then studied by several authors. Several chapters in Bosq [10] are dedicated to a thorough study of this strictly stationary process $(X_n)_{n \in \mathbb{Z}}$ defined by

$$X_n - m = \rho\,(X_{n-1} - m) + \varepsilon_n, \quad n \in \mathbb{Z}, \tag{1}$$

where the $X_n$'s and the $\varepsilon_n$'s are random elements with values in an infinite dimensional vector space $\mathcal{E}$, $\rho$ is an unknown linear operator from $\mathcal{E}$ to $\mathcal{E}$ and $m \in \mathcal{E}$ is the expectation of the process. In all the following we will assume that, for all $n$, $\varepsilon_n$ is independent of $X_{n-1}$. The process $(X_n)_{n \in \mathbb{Z}}$ is Markov whenever the $\varepsilon_n$'s are such that $E(\varepsilon_n \mid X_{n-1}) = 0$, where $E$ denotes expectation.
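
To make the recursion (1) concrete, here is a minimal simulation sketch. It assumes a finite dimensional truncation: curves are replaced by their first $d$ coordinates in some basis, $m = 0$, and the innovations are standard Gaussian. The names `simulate_far1`, `burn_in` and the matrix `rho` are illustrative choices, not taken from the paper.

```python
import math
import random

def simulate_far1(rho, n, burn_in=200, seed=0):
    """Simulate X_k = rho X_{k-1} + eps_k in R^d (m = 0, standard Gaussian noise)."""
    rng = random.Random(seed)
    d = len(rho)
    x = [0.0] * d
    path = []
    for t in range(burn_in + n):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # The comprehension reads the previous state x before it is rebound.
        x = [sum(rho[i][j] * x[j] for j in range(d)) + eps[i] for i in range(d)]
        if t >= burn_in:  # drop the transient so the path is close to stationarity
            path.append(x)
    return path

def frobenius_norm(a):
    return math.sqrt(sum(v * v for row in a for v in row))

# Hypothetical truncated operator; its Frobenius norm (an upper bound
# for the operator norm) is below 1.
rho = [[0.5, 0.1, 0.0],
       [0.0, 0.3, 0.1],
       [0.0, 0.0, 0.2]]
sample = simulate_far1(rho, n=500)
```

The burn-in is only a practical device to start near the stationary regime; it plays no role in the theory.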

The model was extended in Mourid [21] to autoregressive processes of higher orders.

Besse and Cardot proved that the model is well suited to spline techniques. Then Pumo studied autoregressive processes with values in the Banach space of continuous functions on $[0, 1]$.

The PhD thesis by Mas [20] was partly devoted to the topic. Besse, Cardot and Stephenson [7] developed a method based on kernels. Recently Mas and Menneteau [22] announced large and moderate deviations theorems for the process or its covariance sequence, whereas Antoniadis and Sapatinas [3] implemented wavelet methods which considerably improved the mean square prediction error. Even more recently Menneteau [23] proved laws of the iterated logarithm for statistics arising from functional PCA of the process.

The model has proved fruitful in several areas of applied statistics: electrical engineering (Cavallini et al. [12]), climatology (Besse et al. [7], Antoniadis and Sapatinas [3]), medicine (Marion and Pumo).

The main interest of model (1) lies in its predictive power. Estimating the correlation operator $\rho$ only aims at providing an estimate, say $\rho_n$, yielding a predictor $\rho_n(X_n)$ for the unknown $X_{n+1}$, based on the sample $(X_1, \dots, X_n)$.

However, although convergence of $\rho_n(X_n)$ to $\rho(X_n)$ has often been studied, either in probability or almost surely, the issue of weak convergence has not been truly tackled yet. An attempt was proposed in Mas, but the conditions under which the result holds are extremely restrictive. The problem of weak convergence is especially intricate due to the functional framework and to an underlying inverse problem (see next section). A weak convergence result implies obtaining the sharpest rate for convergence in probability, whereas authors studying rates of convergence for the predictor usually just give bounds. Besides, a weak convergence result would be of much help in getting confidence sets for $\rho(X_n) = E(X_{n+1} \mid X_n, X_{n-1}, \dots)$. Maybe a bootstrap procedure could be proposed to achieve the same goal, but on the one hand I did not find any real and reliable bootstrap procedure adapted to this purely functional framework in the literature. On the other hand, even if a bootstrap approach may be satisfactory from a practical viewpoint, it will just provide an approximate distribution. Here the exact asymptotic distribution is given. Besides, the scope of the paper is rather theoretical. The surprising Theorem 3.1 for instance is, to me at least, real food for thought for people dealing with functional data. However a promising approach would be to compare the results of this paper with those obtained by a bootstrap procedure, if any is available.

Another interest of the model is its simplicity. However in the general framework mentioned above, a first problem arises: in the case of a general space $\mathcal{E}$, not much is known about the mathematical description and properties of the linear space, say $\mathcal{L}(\mathcal{E})$, of bounded linear operators from $\mathcal{E}$ to $\mathcal{E}$. Estimating $\rho$ requires building a sequence $\rho_n$ of random linear operators in $\mathcal{L}(\mathcal{E})$, and we may face serious trouble if the space $\mathcal{L}(\mathcal{E})$ is too complex.

Usually authors focus on special cases and take for instance $\mathcal{E} = C^m[0, T]$, a Banach space of functions defined on $[0, T]$ with several continuous derivatives (in Mourid [24]), or $\mathcal{E} = W^{m,p}[0, T]$, a space of Sobolev functions on a real interval (see for instance Adams [1] for definitions and properties of Sobolev spaces). There are practical reasons for these choices. Indeed, the curves $X_n$ are observed at discretized times and must first be reconstructed by implementing splines or wavelets for instance. These techniques provide explicit functions belonging to the spaces mentioned above.

Here appears the second problem: studying weak convergence for random elements, such as our predictor $\rho_n(X_n)$, in general infinite dimensional spaces is especially difficult, sometimes tricky. The most general tool is the Portmanteau Theorem (see Billingsley [8]) but it is a general definition rather than a criterion to check the convergence of measures. Even if we consider the Central Limit Theorem, which is a very important but special case of convergence in distribution for measures, there are only a few spaces for which sufficient conditions are available (even fewer for a necessary condition). We refer to Ledoux and Talagrand for a review on the CLT in Banach spaces. However, if $\mathcal{E}$ is a separable Hilbert space, the situation becomes more favourable. Take $Z_i$ a sequence of random elements in $\mathcal{E}$. It is a well-known fact that the CLT holds for i.i.d. $Z_i$ if and only if the strong second moment is finite (i.e. $E\|Z_1\|^2 < +\infty$). Besides, many authors studied the CLT under different sorts of dependence assumptions (m-dependence, mixing, martingale differences, etc). We refer to Araujo and Giné for a monograph on the CLT. The Hilbertian setting is quite comfortable for several other well-known mathematical reasons:

• All separable infinite dimensional Hilbert spaces are isometrically isomorphic to the sequence space $l^2$, hence have the same underlying geometric structure. They appear as the most natural generalization of the Euclidean space to the infinite dimensional setting.

• The bases are denumerable, the parallelogram identity is valid, the projection on closed convex sets is uniquely defined.

• The operator $\rho$ belongs to the Banach space of bounded linear operators on a Hilbert space. This space is widely used in several areas of mathematics. Spectral decompositions are available for compact operators.

In all the sequel we will set once and for all $\mathcal{E} = \mathcal{H}$, and $\mathcal{H}$ will usually be a space $W^{m,2}$ where the smoothness index $m$ belongs to $\mathbb{N}$ ($W^{0,2} = L^2$).

The next remark is related to $\rho$ and also aims at restricting the field of our research in order to gain some accuracy in the forthcoming results. In fact the space $\mathcal{L}(\mathcal{H})$ is much too large: this Banach space is not separable. This could turn out to be a serious problem as far as measurability is concerned (recall that we need to define a sequence of estimates $\rho_n$ for $\rho$ taking values in $\mathcal{L}(\mathcal{H})$). For other reasons mentioned in the next section, we will suppose that $\rho$ is a compact operator. The space $\mathcal{K}(\mathcal{H})$ of compact operators is separable, and its properties are close to those of (finite size) matrices. Many features of linear operators on finite dimensional spaces generalize to $\mathcal{K}(\mathcal{H})$ in a natural way.

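The space $\mathcal{H}$ is endowed with the norm $\|\cdot\|$ derived from the scalar product $\langle \cdot, \cdot \rangle$. In the case where $\mathcal{H} = W^{m,2}$ we have the usual Sobolev scalar product

$$\langle u, v \rangle = \sum_{p=0}^{m} \int_0^T u^{(p)}(t)\, v^{(p)}(t)\, dt.$$

Spaces of continuous operators on $\mathcal{H}$ are endowed with the classical sup-norm defined for every bounded operator $T$ by

$$\|T\|_\infty = \sup_{x \in \mathcal{H}_1} \|Tx\|$$

where $\mathcal{H}_1$ is the unit ball of $\mathcal{H}$.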

The space of Hilbert-Schmidt operators, denoted $\mathcal{K}_2(\mathcal{H})$, is endowed with the norm $\|T\|_2^2 = \sum_p \|T(e_p)\|^2$ where $(e_p)$ is any complete orthonormal system (c.o.n.s.) in $\mathcal{H}$. The space $\mathcal{K}_2(\mathcal{H})$ is a subspace of $\mathcal{K}(\mathcal{H})$. Note that, to the author's knowledge, the literature on model (1) or its close alternatives in a Hilbertian framework assumes that $\rho \in \mathcal{K}_2(\mathcal{H})$. Consequently we consider in this article a larger class for the unknown parameter.
The tensor product notation is of much use. It makes it possible to define finite rank operators. For $u, v \in \mathcal{H}$,

$$(u \otimes v)(h) = \langle u, h \rangle\, v.$$
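
As an illustration of this notation, the following sketch represents $u \otimes v$ in $\mathbb{R}^3$ as a matrix and checks two facts: it maps $h$ to $\langle u, h \rangle v$, and, being of rank one, its Hilbert-Schmidt norm equals $\|u\|\,\|v\|$. The helper names and the vectors are illustrative assumptions.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def tensor(u, v):
    """Matrix of u (x) v : h -> <u, h> v, i.e. entry [i][j] = v[i] * u[j]."""
    return [[vi * uj for uj in u] for vi in v]

def apply(a, h):
    return [inner(row, h) for row in a]

def hs_norm(a):
    """Hilbert-Schmidt (Frobenius) norm of a matrix."""
    return math.sqrt(sum(x * x for row in a for x in row))

u = [1.0, 2.0, 2.0]   # ||u|| = 3
v = [3.0, 0.0, 4.0]   # ||v|| = 5
h = [1.0, 1.0, 1.0]   # <u, h> = 5

T = tensor(u, v)
image = apply(T, h)          # equals <u, h> v
norm_T = hs_norm(T)          # equals ||u|| ||v||
```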

We may have to deal with another space of operators: the space of trace class operators $\mathcal{K}_1(\mathcal{H}) \subset \mathcal{K}(\mathcal{H})$ (the $\|\cdot\|_1$ norm on this space will not be fully defined here but I just mention that $\|u \otimes v\|_1 = \|u\|\,\|v\|$). Finally we will sometimes use the following norm bound:

$$\|\cdot\|_\infty \le \|\cdot\|_2 \le \|\cdot\|_1.$$

2 Identification and covariance regularization 

In this Hilbert space setting, Bosq [10] proved that whenever there exists $j_0$ such that $\|\rho^{j_0}\|_\infty < 1$ and when $E\|\varepsilon_1\|^2$ is finite, $X_n$ is a strictly stationary sequence. For the sake of simplicity and in order to lighten calculations within the proofs we will assume that $\|\rho\|_\infty < 1$. In the sequel we will assume that $E(X_n) = m = 0$, i.e. we will not address the problem of estimating the mean since this issue was extensively treated in the literature. But we have to face two other serious issues.

2.1 Identifiability 

As the data are of functional nature, the inference on $\rho$ cannot be based on likelihood. Lebesgue's measure does not exist on non locally compact spaces and, to the author's knowledge, the classical notion of density has not been extended to functional random elements. A classical moment method provides the following normal equation:

$$\Delta = \rho\,\Gamma \tag{2}$$

where

$$\Gamma = E(X_1 \otimes X_1), \qquad \Delta = E(X_1 \otimes X_2)$$

are the covariance operator (resp. the cross covariance operator of order one) of the process $(X_n)_{n \in \mathbb{Z}}$.

It is a well known fact that whenever $E\|X_1\|^2$ is finite, $\Gamma$ is a selfadjoint, positive, trace class operator (hence compact). In other words, $\Gamma$ admits the following Schmidt (i.e. spectral) decomposition:

$$\Gamma = \sum_{l \ge 1} \lambda_l\, \pi_l, \qquad \sum_{l \ge 1} \lambda_l < +\infty \tag{3}$$

where $(\lambda_l)_{l \ge 1}$ is the sequence of the positive eigenvalues of $\Gamma$ and $(\pi_l)_{l \ge 1}$ is the associated sequence of projectors. In the sequel the eigenvectors of $\Gamma$ are denoted $(e_l)_{l \ge 1}$, hence $\pi_l = e_l \otimes e_l$, and if $x$ is any vector of $\mathcal{H}$ we set $x_p = \langle x, e_p \rangle$. For further purpose $\Gamma_\varepsilon = E(\varepsilon_1 \otimes \varepsilon_1)$ will stand for the covariance operator of $\varepsilon_1$.
The first step consists in checking that equation (2) correctly defines the unknown parameter $\rho$.

Proposition 2.1 When the inference on $\rho$ is based on the moment equation (2), identifiability holds if and only if $\ker \Gamma = \{0\}$.

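The proof of the Proposition is simple. Let us give a sketch of it now. Assume that $\ker \Gamma \ne \{0\}$ and pick a nonzero $v \in \ker \Gamma$. Setting $\rho_{v,u} = \rho + v \otimes u$ where $u$ is any vector in $\mathcal{H}$, it is basic to see that $\Delta = \rho_{v,u}\,\Gamma$ again, since $(v \otimes u)(\Gamma h) = \langle \Gamma v, h \rangle\, u = 0$ for all $h$. In other words the moment equation may not be able to distinguish between $\rho$ and $\rho_{v,u}$.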
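
The argument can be checked numerically in a toy setting. The sketch below assumes a $3 \times 3$ diagonal covariance matrix with a zero eigenvalue; the matrices `gamma`, `rho` and the vectors `u`, `v` are hypothetical values chosen for illustration only.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def tensor(u, v):
    """Matrix of u (x) v : h -> <u, h> v, i.e. entry [i][j] = v[i] * u[j]."""
    return [[vi * uj for uj in u] for vi in v]

gamma = [[2.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]          # third eigenvalue is zero: ker(gamma) != {0}
rho = [[0.5, 0.1, 0.3],
       [0.0, 0.4, 0.2],
       [0.1, 0.0, 0.3]]
v = [0.0, 0.0, 1.0]                # v lies in ker(gamma)
u = [1.0, -2.0, 0.5]               # arbitrary direction

bump = tensor(v, u)                # the operator v (x) u
rho_vu = [[rho[i][j] + bump[i][j] for j in range(3)] for i in range(3)]

delta = matmul(rho, gamma)         # Delta = rho Gamma
delta_vu = matmul(rho_vu, gamma)   # same moment equation with the perturbed operator
gap = max(abs(delta[i][j] - delta_vu[i][j]) for i in range(3) for j in range(3))
```

Both operators produce the same cross covariance although they differ, which is exactly the lack of identifiability described above.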

Remark 2.1 The condition $\ker \Gamma = \{0\}$ implies that all the eigenvalues are strictly positive. In the sequel we will assume that $\lambda_1 > \lambda_2 > \dots > 0$.

2.2 Regularizing the inverse covariance operator 

Even if the identifiability of $\rho$ is ensured by assumption $A_0$, we must remain cautious when building an estimator. Several serious problems appear.

First it is crucial to note that we cannot deduce from (2) that $\Delta \Gamma^{-1} = \rho$. Indeed $\Gamma^{-1}$ does not necessarily exist. A necessary and sufficient condition for $\Gamma^{-1}$ to be defined as a linear mapping is: $\ker \Gamma = \{0\}$. Then $\Gamma^{-1}$ is an unbounded symmetric operator on $\mathcal{H}$. The consequences are the following:

• $\Gamma^{-1}$ is just defined on the dense vector space

$$\mathcal{D}(\Gamma^{-1}) = \Big\{ x \in \mathcal{H} : \sum_{l \ge 1} \frac{\langle x, e_l \rangle^2}{\lambda_l^2} < +\infty \Big\}$$

and $\mathcal{D}(\Gamma^{-1}) \ne \mathcal{H}$.

• $\Gamma^{-1}$ is a measurable linear mapping but is not continuous; in other words it is continuous at no point at which it is defined, or "the domain of $\Gamma^{-1}$ is also the set of its discontinuities".

• $\Gamma \Gamma^{-1}$ is not the identity operator on $\mathcal{H}$ but on $\mathcal{D}(\Gamma^{-1})$, which entails that (2) implies

$$\Delta \Gamma^{-1} = \rho_{|\operatorname{Im} \Gamma} \ne \rho.$$

The previous facts are very well-known in operator theory and give rise here to an ill-posed problem (or an inverse problem). Since $\Gamma^{-1}$ is extremely irregular, we should propose a way to regularize it, i.e. find a linear operator $\Gamma^{\dagger}$ "close" to $\Gamma^{-1}$ and having additional continuity properties. There are several ways to deal with this problem. We refer to Arsenin and Tikhonov and to Groetsch, amongst many others, for famous books about this topic.

Here the approach is quite intuitive and classical: when (3) holds,

$$\Gamma^{-1}(x) = \sum_{l \ge 1} \lambda_l^{-1}\, \pi_l(x)$$

for all $x$ in $\mathcal{D}(\Gamma^{-1})$. We just set

$$\Gamma^{\dagger} = \sum_{l=1}^{k_n} \lambda_l^{-1}\, \pi_l$$

where $(k_n)_{n \in \mathbb{N}}$ is an increasing sequence tending to infinity. It may be proved that whenever $x \in \mathcal{D}(\Gamma^{-1})$ and $n \uparrow +\infty$,

$$\Gamma^{\dagger}(x) \to \Gamma^{-1}(x).$$

Besides $\Gamma^{\dagger}$ is a continuous operator with $\|\Gamma^{\dagger}\|_\infty = \lambda_{k_n}^{-1}$ and implicitly depends on $n$.
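
The two properties of the truncated inverse can be illustrated in coordinates. The sketch below assumes that everything is expressed in the eigenbasis of $\Gamma$, with hypothetical eigenvalues $\lambda_l = l^{-2}$ and a vector $x$ with coordinates $x_l = l^{-4}$ (so that $x$ lies in the domain of $\Gamma^{-1}$); the infinite dimensional space is replaced by its first 2000 coordinates.

```python
import math

def truncated_inverse_apply(lams, x, k):
    """Apply sum_{l <= k} lambda_l^{-1} pi_l to x, given coordinates in the eigenbasis."""
    return [x[l] / lams[l] if l < k else 0.0 for l in range(len(x))]

L = 2000                                           # numerical stand-in for infinity
lams = [1.0 / (l + 1) ** 2 for l in range(L)]      # hypothetical eigenvalues
x = [1.0 / (l + 1) ** 4 for l in range(L)]         # x_l / lambda_l = l^{-2} is square summable
full = [x[l] / lams[l] for l in range(L)]          # coordinates of Gamma^{-1} x

def error(k):
    approx = truncated_inverse_apply(lams, x, k)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(approx, full)))

ks = (5, 20, 80)
errors = [error(k) for k in ks]            # decreases as k grows
norms = [1.0 / lams[k - 1] for k in ks]    # operator norm 1 / lambda_k blows up
```

The approximation error shrinks while the operator norm grows like $k^2$ here, which is the trade-off behind the choice of $k_n$.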

If (2) is the starting point in our estimation procedure, replacing the unknown operators by their empirical counterparts gives:

$$\Delta_n = \rho_n\, \Gamma_n$$

where

$$\Gamma_n = \frac{1}{n} \sum_{k=1}^{n} X_k \otimes X_k, \qquad \Delta_n = \frac{1}{n-1} \sum_{k=1}^{n-1} X_k \otimes X_{k+1}$$

and this equation just implicitly defines our estimate $\rho_n$ for $\rho$.

The preceding remarks give some clues to complete the estimation step. Setting

$$\Gamma_n^{\dagger} = \sum_{l=1}^{k_n} \hat\lambda_l^{-1}\, \hat\pi_l \tag{4}$$

where $\hat\lambda_l$ and $\hat\pi_l$ are the empirical counterparts of $\lambda_l$ and $\pi_l$, we get:

Definition 2.1 The estimate of $\rho$ is $\rho_n$ given by $\rho_n = \Delta_n \Gamma_n^{\dagger}$.

For further purpose we denote $\hat\Pi_{k_n} = \sum_{l=1}^{k_n} \hat\pi_l$ the projector on the space spanned by the $k_n$ first eigenvectors of $\Gamma_n$.

Remark 2.2 The $\hat\lambda_l$'s and the $\hat e_l$'s are obtained as by-products of the functional PCA of the sample $(X_1, \dots, X_n)$.

2.3 A smoothness condition on the autocorrelation operator 

In order to get the main results given in the next section we need to develop one of the crucial assumptions needed further. This subsection is devoted to explaining it. This condition must be understood as a smoothness condition on the unknown operator $\rho$. But what do we mean by "smoothness" for a linear operator? The notion of smoothness is intuitively related to functions or mappings and should be made clearer in our setting. In order to be more illustrative let us consider for $\rho$ a diagonal operator on $\mathcal{H}$. Say, in any complete orthonormal system:

$$\rho = \operatorname{diag}(\mu_i)_{i \ge 1}$$

with $\mu_i \ge \mu_{i+1}$. Obviously if $\mu_i = 1$ for all $i$, $\rho = I$, and if the sequence $\mu_i$ is bounded ($(\mu_i)_{i \ge 1} \in l^\infty$), $\rho$ is a bounded operator. If $(\mu_i)_{i \ge 1} \in c_0$, $\rho$ is a compact operator. If $(\mu_i)_{i \ge 1} \in l^2$, $\rho$ is a Hilbert-Schmidt operator, etc. The degree of smoothness of $\rho$ will be strictly determined by the rate of decrease to zero of $(|\mu_i|)_{i \ge 1}$ or, generally speaking, of its eigenvalues or characteristic numbers. When the $\mu_i$'s decrease quickly $\rho$ is "close" to any finite dimensional approximation based on the $n$ first $\mu_i$'s (when $n$ gets large). Conversely imagine that the $|\mu_i|$'s tend to infinity; then $\rho$ is unbounded, hence not continuous, hence not smooth.
The next assumption

$$A_1: \quad \big\| \Gamma^{-1/2} \rho \big\|_\infty < +\infty \tag{5}$$

tells us that $\rho$ should be at least as "smooth" as $\Gamma^{1/2}$. Indeed let us try to be more illustrative and assume that $\rho$ is symmetric and has the same basis of eigenvectors as $\Gamma$. Assumption (5) implies that the sequence $(\mu_i / \sqrt{\lambda_i})_{i \in \mathbb{N}}$ is bounded. We set $\tilde\rho = \Gamma^{-1/2} \rho$.

Remark 2.3 As a consequence of the above we remark for further purpose that if $\tilde\rho$ is bounded, so is $\tilde\rho^*$. But for the reasons mentioned in the previous subsection $\tilde\rho^* \ne \rho^* \Gamma^{-1/2}$. In fact $\rho^* \Gamma^{-1/2}$ is a bounded operator defined on $\mathcal{D}(\Gamma^{-1/2})$. Like any bounded operator on a dense domain it may be uniquely extended to a bounded operator defined on the whole $\mathcal{H}$. This extension precisely coincides with $\tilde\rho^*$. I just point out the following: from (3) and (5) we deduce that

$$\sup_p \big\| \rho^* \Gamma^{-1/2} (e_p) \big\| = \sup_p \frac{\| \rho^* (e_p) \|}{\sqrt{\lambda_p}} < M. \tag{6}$$



3 Main results 

The main results of this work are collected in two theorems below. We first recapitulate three basic assumptions under the same label:

$$A_0: \quad \ker \Gamma = \{0\}, \qquad E\|\varepsilon\|^2 < +\infty, \qquad \|\rho\|_\infty < 1.$$

The subscript 0 was given on purpose since this set of assumptions is minimal in order to begin any statistical inference on the model.

Then I remind the reader of the so-called Karhunen-Loeve (KL) expansion of the random element $X$: the distribution of $X$ (i.e. of $X_n$ for all $n$ since the sequence is strictly stationary) is given by

$$X \stackrel{d}{=} \sum_{k=1}^{+\infty} \sqrt{\lambda_k}\, \xi_k\, e_k \tag{7}$$

where $\stackrel{d}{=}$ denotes equality of distributions and the $\xi_k$'s are non correlated real valued random variables with null expectation and unit variance (the $\xi_k$'s are i.i.d. gaussian if $X_1$ is). We will make use of (7) within the proofs.

The following moment assumption is mild:

$$A_2: \quad \sup_k E\,\xi_k^4 < M \tag{8}$$

It is fulfilled by large families of r.v. $\xi_k$'s (subject to $E\xi_k = 0$ and $E\xi_k^2 = 1$) with thin enough tails: gaussian, uniform, two sided exponential, etc., but will fail for certain classes of two sided Pareto random variables for instance. Remember that we study weak convergence for $\rho_n(X_{n+1})$, that $\rho_n$ depends on $\Gamma_n$ and consequently that assumptions on functionals of the fourth moment of $X_1$ (like $A_2$) are unavoidable.

The next assumption is related to the eigenvalues of $\Gamma$.

Let $\lambda_j = \lambda(j)$ where $\lambda$ is a positive function defined on $\mathbb{R}^+$ and with values in $\mathbb{R}^+$. Clearly the function $\lambda$ is decreasing if the eigenvalues are ordered decreasingly and $\lim_{t \to +\infty} \lambda(t) = 0$. We assume that:

$A_3$: The function $\lambda$ is convex.

Remark 3.1 Actually we just need $A_3$ to hold for large values of $j$. This assumption is finally not constraining at all since it is suited to many classical cases: when the rate of decay to zero is arithmetic (say $\lambda_j = \mathrm{Const}/j^{1+\alpha}$, $\alpha > 0$) or exponential ($\lambda_j = \mathrm{Const} \cdot \exp(-\alpha j)$, $\alpha > 0$) and in several other less standard situations such as Laurent series ($\lambda_j = \mathrm{Const}/(j^{\alpha} \log^{1+\beta} j)$; $\alpha, \beta > 0$).

Remark 3.2 Assumption $A_3$ implies that $\lambda_j - \lambda_{j+1} \le \lambda_{j-1} - \lambda_j$.
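
The inequality of Remark 3.2 is easy to test numerically on candidate eigenvalue sequences. The sketch below checks it for two of the decays quoted in Remark 3.1 (with arbitrary illustrative constants) and for a concave decreasing sequence, included only as a hypothetical counterexample.

```python
import math

def increments_shrink(lam, j_min=2, j_max=200):
    """Check lam(j) - lam(j+1) <= lam(j-1) - lam(j) for j_min <= j <= j_max."""
    return all(lam(j) - lam(j + 1) <= lam(j - 1) - lam(j) + 1e-15
               for j in range(j_min, j_max + 1))

def polynomial(j):
    return 1.0 / j ** 1.5            # Const / j^(1 + alpha) with alpha = 0.5

def exponential(j):
    return math.exp(-0.3 * j)        # Const * exp(-alpha j) with alpha = 0.3

def concave(j):
    return 1.0 - (j / 300.0) ** 2    # decreasing on [1, 200] but concave

results = {
    "polynomial": increments_shrink(polynomial),
    "exponential": increments_shrink(exponential),
    "concave": increments_shrink(concave),
}
```

Such a check only probes finitely many indices, so it can refute convexity of a sequence but not establish it.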

The first theorem states the following:

Theorem 3.1 It is impossible for $\rho_n - \rho$ to converge in distribution for the norm topology on $\mathcal{K}$.

Remark 3.3 What is actually proved is: for any normalizing sequence $\alpha_n \uparrow +\infty$, $\alpha_n(\rho_n - \rho)$ either diverges or converges in distribution to the Dirac distribution on the null element of $\mathcal{K}$. Also note that weak convergence cannot take place for the Hilbert-Schmidt topology either since the embedding from $\mathcal{K}_2$ to $\mathcal{K}$ is continuous.

For technical reasons, we will focus on a slightly modified version of the prediction problem. We will assume that $\rho_n$ is built from $(X_1, \dots, X_n)$ and that $X_{n+2}$ is to be predicted from $\rho_n$ and $X_{n+1}$. In other words the sample is tiled: the last observed curve ($X_{n+1}$ here) is taken into account to predict $X_{n+2}$ but not to construct $\rho_n$.

Here is the main result of the paper. Recall that $\hat\Pi_{k_n}$ was introduced just after Definition 2.1.

Theorem 3.2 When assumptions $A_0 - A_3$ hold and if $k_n = o\big(n^{1/4} / \log n\big)$,

$$\sqrt{\frac{n}{k_n}}\, \Big( \rho_n(X_{n+1}) - \rho\, \hat\Pi_{k_n}(X_{n+1}) \Big) \xrightarrow{\;w\;} G$$

where $G$ is an $\mathcal{H}$-valued centered gaussian random variable with covariance operator $\Gamma_\varepsilon$.

Remark 3.4 Theorem 3.1 remains unchanged if $\rho$ is changed into $\rho\, \hat\Pi_{k_n}$, which appears more "natural" in view of Theorem 3.2.
This central result deserves some comments. First of all the normalizing sequence is typically nonparametric: $\sqrt{n/k_n}$. Second a bias term appears. Recently, Cardot, Mas and Sarda [11] obtained a similar result in a much simpler regression model, based on i.i.d. observations unlike here. A non random bias was obtained (namely the random projector $\hat\Pi_{k_n}$ was replaced by a non random one) but this could not be carried out here. Also note that since $\varepsilon$ is the innovation of the process $X$, the best target we can hope to reach is $\rho(X_{n+1})$, i.e. the conditional expectation of $X_{n+2}$, which is random in any case. However it is simple to prove that

$$\big\| \rho\, \hat\Pi_{k_n}(X_{n+1}) - \rho(X_{n+1}) \big\|$$

tends to zero in probability when $n$ tends to infinity. Finally even if the random term $\rho\, \hat\Pi_{k_n}(X_{n+1})$ is not quite satisfactory from a theoretical viewpoint, it may be easily interpreted by practitioners since $\hat\Pi_{k_n}(X_{n+1})$ is the projection of the new input onto the $k_n$ first axes of the functional PCA of the sample. These axes have optimality properties w.r.t. the decomposition of variance for the process $X$.



4 Concluding remarks 

As seen from the literature on the subject, two modes of stochastic convergence had already been investigated for estimates of $\rho$ in model (1): convergence in probability and almost sure convergence. Weak convergence was the missing one, essentially because it is more intricate. In fact from

$$\| \rho_n(X_{n+1}) - \rho(X_{n+1}) \| \le \| \rho_n - \rho \|_\infty\, \| X_{n+1} \|$$

it is plain that convergence (almost sure or in probability) for $\| \rho_n - \rho \|_\infty$ implies convergence for the predictor. Theorem 3.1 proves that the situation is quite different as far as convergence in distribution is concerned.

It should also be stressed that assumptions $A_0 - A_3$ are truly mild. For instance all theoretical articles dealing with the problem of asymptotics for the predictor assume that $\rho$ is symmetric and that the rate of decay of the sequence of eigenvalues is known.

The main advance undoubtedly lies in the fact that the dimension sequence $k_n$ does not depend anymore on the eigenvalues (previously such conditions as $n^{a} \lambda_{k_n} \to +\infty$ for some $a$ were necessary). The existence of a universal $k_n$ makes it possible to revisit all previous results on the topic and sheds a new light on this model. Indeed in view of Theorem 3.2 it is tempting to postulate that an $L^2$ minimax rate of convergence could be $k_n/n$ when $\rho$ belongs to the set defined by assumption $A_1$ (this set is nothing but an ellipsoid of $\mathcal{K}$). But these considerations are beyond the scope of this article.



5 Mathematical derivations 

Assumptions $A_0 - A_3$ are supposed to hold throughout the proofs. The generic notation $M$ will be used to denote universal constants. The next equation is straightforward from (1), links $\Gamma$, $\Gamma_\varepsilon$ and $\rho$, and will soon be needed:

$$\Gamma = \rho\, \Gamma \rho^* + \Gamma_\varepsilon. \tag{9}$$
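
Equation (9) can be illustrated in finite dimension, where the stationary covariance is the fixed point of the map $G \mapsto \rho G \rho^* + \Gamma_\varepsilon$. The sketch below assumes hypothetical $2 \times 2$ matrices `rho` and `gamma_eps` and simply iterates this map, which converges because the norm of `rho` is below 1.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

rho = [[0.5, 0.2],
       [0.0, 0.3]]          # hypothetical autocorrelation operator, norm < 1
gamma_eps = [[1.0, 0.0],
             [0.0, 0.5]]    # hypothetical innovation covariance

def step(g):
    """One application of G -> rho G rho^* + Gamma_eps."""
    return add(matmul(matmul(rho, g), transpose(rho)), gamma_eps)

gamma = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    gamma = step(gamma)

# Residual of equation (9) at the computed fixed point.
image = step(gamma)
residual = max(abs(gamma[i][j] - image[i][j]) for i in range(2) for j in range(2))
```

The limit is the series $\sum_{k \ge 0} \rho^k \Gamma_\varepsilon (\rho^*)^k$, which is another way of reading (9).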
We start with letting

$$S_n = \frac{1}{n} \sum_{k=1}^{n} X_{k-1} \otimes \varepsilon_k.$$

Easy calculations give

$$\rho_n = \Delta_n \Gamma_n^{\dagger} = \rho\, \Gamma_n \Gamma_n^{\dagger} + S_n \Gamma_n^{\dagger},$$

$$\rho_n = \rho\, \hat\Pi_{k_n} + S_n \Gamma_n^{\dagger}. \tag{10}$$

It is plain by (4) that $\Gamma_n \Gamma_n^{\dagger} = \hat\Pi_{k_n}$. Hence:

$$\rho_n - \rho\, \hat\Pi_{k_n} = S_n \big( \Gamma_n^{\dagger} - \Gamma^{\dagger} \big) + S_n \Gamma^{\dagger} \tag{11}$$

which is the starting point.

This section is decomposed into three subsections. In the first one preliminary results and tools connected with the theory of perturbation for operators on Hilbert spaces are provided. In the second part I prove that $S_n(\Gamma_n^{\dagger} - \Gamma^{\dagger})$ is a vanishing term if the dimension sequence $k_n$ is well chosen. The third part is devoted to studying weak convergence and proving Theorem 3.2. The proof of Theorem 3.1 is postponed to the end of the paper.

5.1 Preliminary results
5.1.1 Some inequalities

We first deal with a crucial Lemma.

Lemma 5.1 We have:

$$\sup_{m,p}\; n\, \frac{E \langle (\Gamma_n - \Gamma)(e_p), e_m \rangle^2}{\lambda_p \lambda_m} < M \tag{12}$$

$$\sup_{m,p}\; n\, \frac{E \langle S_n (e_p), e_m \rangle^2}{\lambda_p \lambda_m} < M \tag{13}$$

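Proof. We begin with proving (12). We have

$$\langle (\Gamma_n - \Gamma)(e_p), e_m \rangle^2 = \Big[ \frac{1}{n} \sum_{k=1}^{n} \big( \langle X_k, e_p \rangle \langle X_k, e_m \rangle - \langle \Gamma e_p, e_m \rangle \big) \Big]^2$$

and (the centering term $\langle \Gamma e_p, e_m \rangle$ vanishes when $p \ne m$)

$$E \langle (\Gamma_n - \Gamma)(e_p), e_m \rangle^2 = \frac{1}{n} E \big( \langle X_1, e_p \rangle^2 \langle X_1, e_m \rangle^2 \big) + \frac{2}{n^2} \sum_{1 \le i < k \le n} E \big( \langle X_i, e_p \rangle \langle X_i, e_m \rangle \langle X_k, e_p \rangle \langle X_k, e_m \rangle \big).$$

It is easily seen by the KL decomposition (7) and assumption $A_2$ that the first term may be bounded by

$$\frac{1}{n} \lambda_p \lambda_m\, E \big( \xi_p^2 \xi_m^2 \big) \le \frac{M}{n} \lambda_p \lambda_m \tag{14}$$

whether $p = m$ or $p \ne m$.

Now assume that $p \ne m$. We study the second term:

$$X_k = \varepsilon_k + \rho(\varepsilon_{k-1}) + \dots + \rho^{k-i-1}(\varepsilon_{i+1}) + \rho^{k-i}(X_i), \qquad X_k = E_{k,i} + \rho^{k-i}(X_i)$$

where

$$E_{k,i} = \varepsilon_k + \dots + \rho^{k-i-1}(\varepsilon_{i+1}),$$

hence

$$\begin{aligned}
E \sum_{i<k} \langle X_i, e_p \rangle \langle X_i, e_m \rangle \langle X_k, e_p \rangle \langle X_k, e_m \rangle
&= E \sum_{i<k} \langle X_i, e_p \rangle \langle X_i, e_m \rangle \langle \rho^{k-i}(X_i), e_p \rangle \langle \rho^{k-i}(X_i), e_m \rangle \\
&\quad + E \sum_{i<k} \langle X_i, e_p \rangle \langle X_i, e_m \rangle \langle E_{k,i}, e_p \rangle \langle E_{k,i}, e_m \rangle && \text{(i)} \\
&= E \sum_{i<k} \langle X_i, e_p \rangle \langle X_i, e_m \rangle \langle \rho^{k-i}(X_i), e_p \rangle \langle \rho^{k-i}(X_i), e_m \rangle && \text{(ii)} \\
&= \sum_{i<k} E \big( \langle X_1, e_p \rangle \langle X_1, e_m \rangle \langle \rho^{k-i}(X_1), e_p \rangle \langle \rho^{k-i}(X_1), e_m \rangle \big) && \text{(iii)} \\
&= E \Big[ \langle X_1, e_p \rangle \langle X_1, e_m \rangle \sum_{i<k} \langle \rho^{k-i}(X_1), e_p \rangle \langle \rho^{k-i}(X_1), e_m \rangle \Big] \\
&= E \Big[ \langle X_1, e_p \rangle \langle X_1, e_m \rangle \sum_{k=1}^{n-1} (n-k) \langle \rho^{k}(X_1), e_p \rangle \langle \rho^{k}(X_1), e_m \rangle \Big] && \text{(iv)}
\end{aligned}$$

where (ii) stems from (i) because if $p \ne m$

$$E \big( \langle X_i, e_p \rangle \langle X_i, e_m \rangle \langle E_{k,i}, e_p \rangle \langle E_{k,i}, e_m \rangle \big) = E \big( \langle X_i, e_p \rangle \langle X_i, e_m \rangle \big)\, E \big( \langle E_{k,i}, e_p \rangle \langle E_{k,i}, e_m \rangle \big) = 0$$

and (iii) stems from (ii) by stationarity. Now by (iv),

$$\Big| E \sum_{1 \le i < k \le n} \langle X_i, e_p \rangle \langle X_i, e_m \rangle \langle X_k, e_p \rangle \langle X_k, e_m \rangle \Big| \le n\, E \Big[ \big| \langle X_1, e_p \rangle \langle X_1, e_m \rangle \big| \sum_{k=1}^{n-1} \Big( 1 - \frac{k}{n} \Big) \big| \langle \rho^{k}(X_1), e_p \rangle \langle \rho^{k}(X_1), e_m \rangle \big| \Big]. \tag{15}$$

Let us fix $k \ge 1$ and develop

$$\langle \rho^k(X_1), e_p \rangle \langle \rho^k(X_1), e_m \rangle = \sqrt{\lambda_p \lambda_m}\, \Big\langle \rho^{k-1}(X_1), \frac{\rho^*(e_p)}{\sqrt{\lambda_p}} \Big\rangle \Big\langle \rho^{k-1}(X_1), \frac{\rho^*(e_m)}{\sqrt{\lambda_m}} \Big\rangle$$

and denoting $u_p = \rho^*(e_p)/\sqrt{\lambda_p}$,

$$\big| \langle \rho^k(X_1), e_p \rangle \langle \rho^k(X_1), e_m \rangle \big| \le \sqrt{\lambda_p \lambda_m}\, \big\| \rho^{k-1} \big\|_\infty^2\, \|u_p\|\, \|u_m\|\, \|X_1\|^2 \le M \sqrt{\lambda_p \lambda_m}\, \big\| \rho^{k-1} \big\|_\infty^2\, \|X_1\|^2 \tag{16}$$

since by (6) $\|u_p\|$ and $\|u_m\|$ may be bounded uniformly with respect to $p$ and $m$ by $\|\tilde\rho^*\|_\infty$ (see Remark 2.3 above). Then

$$\Big| E \sum_{1 \le i < k \le n} \langle X_i, e_p \rangle \langle X_i, e_m \rangle \langle X_k, e_p \rangle \langle X_k, e_m \rangle \Big| \le M n \sqrt{\lambda_p \lambda_m}\; E \big( \|X_1\|^2\, | \langle X_1, e_p \rangle \langle X_1, e_m \rangle | \big) \sum_{k=1}^{n-1} \Big( 1 - \frac{k}{n} \Big) \big\| \rho^{k-1} \big\|_\infty^2.$$

And

$$E \big( \|X_1\|^2\, | \langle X_1, e_p \rangle \langle X_1, e_m \rangle | \big) = \sqrt{\lambda_p \lambda_m} \sum_{l=1}^{+\infty} \lambda_l\, E \big( \xi_l^2\, | \xi_p \xi_m | \big) \tag{17}$$

by (7) again. Applying the Cauchy-Schwarz inequality twice we bound the infinite sum by a constant which does not depend on $p$ and $m$. Collecting (14), (15), (16) and (17) we get

$$n \sup_{p \ne m} \frac{E \langle (\Gamma_n - \Gamma)(e_p), e_m \rangle^2}{\lambda_p \lambda_m} < M.$$

In order to complete the proof (remember that we assumed that $p \ne m$ just below (14)) we can check that our computations remain valid if we take $p = m$.

The proof of (13) is similar but simpler. We have

$$E \langle S_n(e_p), e_m \rangle^2 = \frac{1}{n} E \big( \langle X_1, e_p \rangle^2 \langle \varepsilon_2, e_m \rangle^2 \big) = \frac{1}{n} E \big( \langle X_1, e_p \rangle^2 \big)\, E \big( \langle \varepsilon_2, e_m \rangle^2 \big) = \frac{1}{n} \lambda_p \langle \Gamma_\varepsilon e_m, e_m \rangle = \frac{1}{n} \lambda_p \big( \lambda_m - \langle \rho \Gamma \rho^* e_m, e_m \rangle \big) \le \frac{\lambda_p \lambda_m}{n}$$

since $\Gamma = \rho \Gamma \rho^* + \Gamma_\varepsilon$. ■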
The proof of the three following Lemmas may be found in Cardot, Mas, Sarda [11].

Lemma 5.2 Consider two positive integers $j$ and $k$ large enough and such that $k > j$. Then

$$j \lambda_j \ge k \lambda_k \quad \text{and} \quad \lambda_j - \lambda_k \ge \Big( 1 - \frac{j}{k} \Big) \lambda_j. \tag{18}$$

Besides

$$\sum_{j \ge k} \lambda_j \le (k+1) \lambda_k. \tag{19}$$

Lemma 5.3 The following is true for $j$ large enough

$$\sum_{l \ne j} \frac{\lambda_l}{|\lambda_l - \lambda_j|} \le M j \log j.$$
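
These bounds can be sanity-checked on a concrete convex sequence. The sketch below assumes the hypothetical decay $\lambda_j = j^{-2}$, truncates the infinite sums at a large index, and verifies (18), (19) and the boundedness of the ratio in Lemma 5.3 for a few indices. It is a numerical illustration, not a proof.

```python
import math

def lam(j):
    return 1.0 / j ** 2          # hypothetical convex eigenvalue decay

def check_lemma_5_2(j, k, tail=100000):
    """Check (18) and a truncated version of (19) for the pair j < k."""
    first = j * lam(j) >= k * lam(k)
    second = lam(j) - lam(k) >= (1.0 - j / k) * lam(j)
    third = sum(lam(l) for l in range(k, tail)) <= (k + 1) * lam(k)
    return first and second and third

def lemma_5_3_ratio(j, tail=20000):
    """Truncated sum over l != j of lam(l) / |lam(l) - lam(j)|, divided by j log j."""
    total = sum(lam(l) / abs(lam(l) - lam(j)) for l in range(1, tail) if l != j)
    return total / (j * math.log(j))

lemma_5_2_ok = all(check_lemma_5_2(j, k) for j, k in [(5, 6), (10, 40), (50, 51)])
ratios = [lemma_5_3_ratio(j) for j in (10, 50, 200)]
```

For this particular decay the ratio stays between 1 and 2 at the tested indices, consistent with a bound of order $j \log j$.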



5.1.2 A few basic facts about perturbation theory 

Perturbation theory for bounded operators is a powerful tool all along our study and is of much help when dealing with random (or not) covariance operators. It has several theoretical advantages: for instance eigenprojectors or pseudo inverses of $\Gamma$ may be expressed as functions of $\Gamma$ only (without introducing the eigenvectors). However this theory is not widely used in statistics although the only mathematical prerequisite is the theory of holomorphic functions and of integrals on contours in the complex plane. We refer to Dunford-Schwartz [13] (Chapter VII.3) or to Gohberg, Goldberg and Kaashoek for an introduction to functional calculus for operators related with Riesz integrals.

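Let us denote by $\mathcal{B}_i$ the oriented circle of the complex plane with center $\lambda_i$ and radius $\delta_i/3$ and define

$$\mathcal{C}_n = \bigcup_{i=1}^{k_n} \mathcal{B}_i.$$

The open domain whose boundary is $\mathcal{C}_n$ is not connected; however we can apply the functional calculus for bounded operators (see Dunford-Schwartz [13], Section VII.3, Definitions 8 and 9). Results from perturbation theory yield:

$$\Pi_{k_n} = \frac{1}{2 \pi \iota} \int_{\mathcal{C}_n} (zI - \Gamma)^{-1}\, dz$$

where $\iota^2 = -1$, $\Pi_{k_n}$ is defined similarly to $\hat\Pi_{k_n}$ (see Theorem 3.2) and stands for the projector on the space spanned by the $k_n$ first eigenvectors of $\Gamma$. The integral is defined on the complex plane. Note that the random counterpart (i.e. where $\Pi_{k_n}$ and $\Gamma$ are respectively replaced by $\hat\Pi_{k_n}$ and $\Gamma_n$) of the previous equation is just:

$$\hat\Pi_{k_n} = \frac{1}{2 \pi \iota} \int_{\hat{\mathcal{C}}_n} (zI - \Gamma_n)^{-1}\, dz$$

and the contour $\hat{\mathcal{C}}_n$ is random and depends on the $\hat\lambda_j$'s. The following equalities are also valid

$$\Gamma^{\dagger} = \frac{1}{2 \pi \iota} \int_{\mathcal{C}_n} z^{-1} (zI - \Gamma)^{-1}\, dz, \qquad \Gamma_n^{\dagger} = \frac{1}{2 \pi \iota} \int_{\hat{\mathcal{C}}_n} z^{-1} (zI - \Gamma_n)^{-1}\, dz$$

and

$$S_n \big( \Gamma_n^{\dagger} - \Gamma^{\dagger} \big)(X_{n+1}) = \frac{1}{2 \pi \iota} \int_{\hat{\mathcal{C}}_n} z^{-1} S_n (zI - \Gamma_n)^{-1} (X_{n+1})\, dz - \frac{1}{2 \pi \iota} \int_{\mathcal{C}_n} z^{-1} S_n (zI - \Gamma)^{-1} (X_{n+1})\, dz. \tag{20}$$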
As announced at the beginning of the proof section we will prove in the next subsection that (20), correctly normalized by $\sqrt{n/k_n}$, tends to zero in probability, hence is negligible. We need two Lemmas to start. In these Lemmas the square root of a symmetric operator $T$, say $T^{1/2}$, appears. The bounded operator $T^{1/2}$ has the same eigenvectors as $T$. Its eigenvalues are the complex square roots of those of $T$.



Lemma 5.4 We have for $j$ large enough

$$E \sup_{z \in \mathcal{B}_j} \big\| (zI - \Gamma)^{-1/2} (\Gamma_n - \Gamma) (zI - \Gamma)^{-1/2} \big\|_\infty^2 \le \frac{M}{n} (j \log j)^2 \tag{21}$$

$$E \sup_{z \in \mathcal{B}_j} \big\| S_n (zI - \Gamma)^{-1/2} \big\|_\infty^2 \le \frac{M}{n}\, j \log j \tag{22}$$

$$E \sup_{z \in \mathcal{B}_j} \big\| (zI - \Gamma)^{-1/2} \varepsilon_1 \big\|^2 \le M j \log j \tag{23}$$

In fact this last Lemma was proved in Cardot, Mas, Sarda [11] in an i.i.d. framework. However a quick inspection of the proof shows that, by Lemma 5.1, the same result holds in this dependent setting for (21) and (22). In order to convince the suspicious reader I now give the derivation of (23), which uses basically the same technique as for (21) and (22) but is shorter. We have:



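$$\sup_{z \in \mathcal{B}_j} \big\| (zI - \Gamma)^{-1/2} \varepsilon_1 \big\|^2 = \sup_{z \in \mathcal{B}_j} \sum_{p \ge 1} \frac{\langle \varepsilon_1, e_p \rangle^2}{|z - \lambda_p|} \le M \sum_{p \ge 1,\, p \ne j} \frac{\langle \varepsilon_1, e_p \rangle^2}{|\lambda_j - \lambda_p|}$$

since obviously for all $p \ne j$, $|z - \lambda_p| \ge |\lambda_j - \lambda_p| / 2$ when $z \in \mathcal{B}_j$. Then

$$E \sup_{z \in \mathcal{B}_j} \big\| (zI - \Gamma)^{-1/2} \varepsilon_1 \big\|^2 \le M \sum_{p \ge 1,\, p \ne j} \frac{E \langle \varepsilon_1, e_p \rangle^2}{|\lambda_j - \lambda_p|}.$$

Now from $\Gamma = \Gamma_\varepsilon + \rho \Gamma \rho^*$ we see that $E \langle \varepsilon_1, e_p \rangle^2 = \langle \Gamma_\varepsilon e_p, e_p \rangle \le \langle \Gamma e_p, e_p \rangle = \lambda_p$, hence

$$E \sup_{z \in \mathcal{B}_j} \big\| (zI - \Gamma)^{-1/2} \varepsilon_1 \big\|^2 \le M \sum_{p \ge 1,\, p \ne j} \frac{\lambda_p}{|\lambda_j - \lambda_p|} \le M j \log j$$

by Lemma 5.3.

This last Lemma will be used when dealing with the residual term $S_n ( \Gamma_n^{\dagger} - \Gamma^{\dagger} )$ appearing in (11).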
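Lemma 5.5 Denoting

$$\mathcal{E}_j = \Big\{ \sup_{z \in \mathcal{B}_j} \big\| (zI - \Gamma)^{-1/2} (\Gamma_n - \Gamma) (zI - \Gamma)^{-1/2} \big\|_\infty < 1/2 \Big\},$$

the following holds

$$\sup_{z \in \mathcal{B}_j} \big\| (zI - \Gamma)^{1/2} (zI - \Gamma_n)^{-1} (zI - \Gamma)^{1/2} \big\|_\infty\, 1_{\mathcal{E}_j} \le 2, \quad \text{a.s.}$$

Besides

$$P \big( \mathcal{E}_j^c \big) \le M\, \frac{j \log j}{\sqrt{n}} \tag{24}$$

where $M$ is some positive constant.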



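Proof. We have successively

$$(zI - \Gamma_n)^{-1} = (zI - \Gamma)^{-1} + (zI - \Gamma)^{-1} (\Gamma_n - \Gamma) (zI - \Gamma_n)^{-1}$$

hence

$$(zI - \Gamma)^{1/2} (zI - \Gamma_n)^{-1} (zI - \Gamma)^{1/2} = I + (zI - \Gamma)^{-1/2} (\Gamma_n - \Gamma) (zI - \Gamma_n)^{-1} (zI - \Gamma)^{1/2}$$

and

$$\Big[ I + (zI - \Gamma)^{-1/2} (\Gamma - \Gamma_n) (zI - \Gamma)^{-1/2} \Big] (zI - \Gamma)^{1/2} (zI - \Gamma_n)^{-1} (zI - \Gamma)^{1/2} = I. \tag{25}$$

It is a well known fact that if the linear operator $T$ satisfies $\|T\|_\infty < 1$ then $I + T$ is invertible, its inverse is given by the formula

$$(I + T)^{-1} = I - T + T^2 - \dots$$

and

$$\big\| (I + T)^{-1} \big\|_\infty \le \frac{1}{1 - \|T\|_\infty}.$$

From (25), applied with $T = (zI - \Gamma)^{-1/2} (\Gamma - \Gamma_n) (zI - \Gamma)^{-1/2}$, we deduce that

$$\begin{aligned}
\big\| (zI - \Gamma)^{1/2} (zI - \Gamma_n)^{-1} (zI - \Gamma)^{1/2} \big\|_\infty\, 1_{\mathcal{E}_j}
&= \Big\| \Big[ I + (zI - \Gamma)^{-1/2} (\Gamma - \Gamma_n) (zI - \Gamma)^{-1/2} \Big]^{-1} \Big\|_\infty 1_{\mathcal{E}_j} \\
&\le \frac{1}{1 - \big\| (zI - \Gamma)^{-1/2} (\Gamma_n - \Gamma) (zI - \Gamma)^{-1/2} \big\|_\infty}\, 1_{\mathcal{E}_j} \le 2, \quad \text{a.s.}
\end{aligned}$$

Now, the bound in (24) stems easily from the Markov inequality and (21) in Lemma 5.4. This finishes the proof of the Lemma. ■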
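
The Neumann series argument used in this proof can be illustrated on a small matrix. The sketch below assumes a hypothetical $2 \times 2$ matrix `T` with operator norm below 1, sums the series $I - T + T^2 - \dots$ up to a fixed order, and compares the operator norm of the result with the bound $1/(1 - \|T\|_\infty)$. The closed form used for the operator norm is specific to real $2 \times 2$ matrices.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def op_norm(a):
    """Largest singular value of a real 2x2 matrix (closed form)."""
    f = sum(x * x for row in a for x in row)          # squared Frobenius norm
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return math.sqrt((f + math.sqrt(max(f * f - 4.0 * det * det, 0.0))) / 2.0)

identity = [[1.0, 0.0], [0.0, 1.0]]
T = [[0.2, 0.1], [-0.1, 0.3]]                         # hypothetical, operator norm < 1
minus_T = [[-x for x in row] for row in T]

# Partial Neumann sum S = I - T + T^2 - ... (60 terms).
S = [[0.0, 0.0], [0.0, 0.0]]
term = identity
for _ in range(60):
    S = add(S, term)
    term = matmul(term, minus_T)

product = matmul(add(identity, T), S)                 # should be close to I
defect = max(abs(product[i][j] - identity[i][j]) for i in range(2) for j in range(2))
bound = 1.0 / (1.0 - op_norm(T))
```

On the event $\mathcal{E}_j$ the relevant operator has norm below $1/2$, which is why the bound equals 2 in Lemma 5.5.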



5.2 Residual term 

This first lemma only aims at proving that the random contour $\widehat{\mathcal C}_n$ can be replaced by the non random one $\mathcal C_n$ in (20) in order to merge both integrals.

Lemma 5.6 When $\frac{1}{\sqrt n}k_n^2\log k_n\to0$,

$$
S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})=\int_{\mathcal C_n}z^{-1}S_n\left[(zI-\Gamma_n)^{-1}-(zI-\Gamma)^{-1}\right](X_{n+1})\,dz+L_n
$$

where $\sqrt n\,\|L_n\|$ vanishes in probability.

Proof. We introduce the following event

$$
\mathcal A_n=\left\{\forall j\in\{1,\dots,k_n\}:\ \frac{|\hat\lambda_j-\lambda_j|}{\delta_j}<1/8\right\}
$$

and $1_{\mathcal A_n}$ is the indicator function of the set $\mathcal A_n$.

Introducing the set $\mathcal A_n$ lets us consider the situation when all the ordered eigenvalues of $\Gamma_n$ are close enough to those of $\Gamma$. In fact when $\mathcal A_n$ holds all the $k_n$ first empirical eigenvalues $\hat\lambda_j$ lie in the circle of center $\lambda_j$ and radius $\delta_j/8$, say $\widetilde B_j$ (included in $B_j$). Consequently none of the $\hat\lambda_j$ is located in the annulus between $\widetilde B_j$ and $B_j$ and when $\mathcal A_n$ holds $\widehat{\mathcal C}_n$ may be replaced by $\mathcal C_n$. It is clear from previous remarks that



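$$
\begin{aligned}
S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})
&=S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\left(1_{\mathcal A_n}+1_{\mathcal A_n^c}\right)\\
&=\int_{\mathcal C_n}z^{-1}S_n\left[(zI-\Gamma_n)^{-1}-(zI-\Gamma)^{-1}\right](X_{n+1})\,dz\\
&\quad-\left(\int_{\mathcal C_n}z^{-1}S_n\left[(zI-\Gamma_n)^{-1}-(zI-\Gamma)^{-1}\right](X_{n+1})\,dz\right)1_{\mathcal A_n^c}\\
&\quad+S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\,1_{\mathcal A_n^c}.
\end{aligned}
$$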



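We set

$$
\begin{aligned}
L_n&=S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\,1_{\mathcal A_n^c}
-\left(\int_{\mathcal C_n}z^{-1}S_n\left[(zI-\Gamma_n)^{-1}-(zI-\Gamma)^{-1}\right](X_{n+1})\,dz\right)1_{\mathcal A_n^c}\\
&=\left[S_n\Gamma_n^\dagger(X_{n+1})-\int_{\mathcal C_n}z^{-1}S_n(zI-\Gamma_n)^{-1}(X_{n+1})\,dz\right]1_{\mathcal A_n^c}
\end{aligned}
$$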



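and we see that

$$
P\left(\sqrt n\,\|L_n\|>\varepsilon\right)\leq P\left(1_{\mathcal A_n^c}>\varepsilon\right)=P\left(\mathcal A_n^c\right).
$$

It suffices to get $P(\mathcal A_n^c)\to0$. But

$$
P\left(\mathcal A_n^c\right)\leq\sum_{j=1}^{k_n}P\left(|\hat\lambda_j-\lambda_j|>\delta_j/8\right).
$$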



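Now we refer to Theorem 4.10 of Bosq [10]. Following the proof of this Theorem along p. 122 and 123 it is proved that the asymptotic behaviour of $|\hat\lambda_j-\lambda_j|$ is the same as that of $|\langle(\Gamma_n-\Gamma)e_j,e_j\rangle|$. Then

$$
P\left(|\hat\lambda_j-\lambda_j|>\delta_j/8\right)\leq\frac{8}{\delta_j}E|\hat\lambda_j-\lambda_j|\sim8\,\frac{\lambda_j}{\delta_j}\,\frac{E|\langle(\Gamma_n-\Gamma)e_j,e_j\rangle|}{\lambda_j}.
$$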



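By assumption A2 we get

$$
\frac{E|\langle(\Gamma_n-\Gamma)e_j,e_j\rangle|}{\lambda_j}\leq\sqrt{\frac{E\langle(\Gamma_n-\Gamma)e_j,e_j\rangle^2}{\lambda_j^2}}\leq\frac{M}{\sqrt n}.
$$

At last, bounding $\sum_{j=1}^{k_n}\lambda_j/\delta_j$ by $M'k_n^2\log k_n$, we get

$$
P\left(\mathcal A_n^c\right)\leq\frac{8M}{\sqrt n}\sum_{j=1}^{k_n}\frac{\lambda_j}{\delta_j}\leq\frac{M''}{\sqrt n}\,k_n^2\log k_n\to0.
$$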

This concludes the proof of the lemma. ■

For the sake of clarity, from now on we will abusively write

$$
S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})=\int_{\mathcal C_n}z^{-1}S_n\left[(zI-\Gamma_n)^{-1}-(zI-\Gamma)^{-1}\right](X_{n+1})\,dz
$$

but Lemma 5.6 above shows that this does not affect the validity of our forthcoming results.

The next Proposition is the central result of this subsection.

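Proposition 5.1 If $\frac{1}{\sqrt n}k_n^2(\log k_n)^2\to0$ (which is true if $k_n=o\left(\frac{n^{1/4}}{\log n}\right)$) we have:

$$
\sqrt{\frac{n}{k_n}}\,S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\xrightarrow{P}0
$$

in $\mathcal H$.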



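Proof of Proposition 5.1:

We develop:

$$
\begin{aligned}
S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})
&=\int_{\mathcal C_n}z^{-1}S_n\left[(zI-\Gamma_n)^{-1}-(zI-\Gamma)^{-1}\right](X_{n+1})\,dz\\
&=\int_{\mathcal C_n}z^{-1}S_n(zI-\Gamma)^{-1}(\Gamma_n-\Gamma)(zI-\Gamma_n)^{-1}(X_{n+1})\,dz\\
&=\int_{\mathcal C_n}z^{-1}S_n(zI-\Gamma)^{-1}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\\
&\qquad\times(zI-\Gamma)^{1/2}(zI-\Gamma_n)^{-1}(zI-\Gamma)^{1/2}(zI-\Gamma)^{-1/2}(X_{n+1})\,dz
\end{aligned}
$$

and

$$
\begin{aligned}
\left\|S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\right\|
&\leq\int_{\mathcal C_n}|z|^{-1/2}\left\|z^{-1/2}S_n(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\right\|_\infty\\
&\qquad\times\left\|(zI-\Gamma)^{1/2}(zI-\Gamma_n)^{-1}(zI-\Gamma)^{1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}(X_{n+1})\right\|dz.
\end{aligned}
$$

By Lemma 5.5,

$$
\begin{aligned}
\left\|S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\right\|
&=\left\|S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\right\|1_{\cap_j\mathcal E_j}+\left\|S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\right\|1_{\cup_j\mathcal E_j^c}\\
&\leq2\int_{\mathcal C_n}|z|^{-1/2}\left\|z^{-1/2}S_n(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}(X_{n+1})\right\|dz\qquad(26)\\
&\quad+\left\|S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\right\|1_{\cup_j\mathcal E_j^c}.
\end{aligned}
$$

Obviously $\sqrt{n/k_n}\left\|S_n\left(\Gamma_n^\dagger-\Gamma^\dagger\right)(X_{n+1})\right\|1_{\cup_j\mathcal E_j^c}$ decays to zero in probability whenever $\sum_{j=1}^{k_n}P\left(\mathcal E_j^c\right)$ does, i.e. when $\frac{1}{\sqrt n}\sum_{j=1}^{k_n}j\log j\sim\frac{k_n^2\log k_n}{\sqrt n}$ does.

Let us turn to (26). We tile it into two terms by decomposing $X_{n+1}=\rho(X_n)+\varepsilon_{n+1}$, namely

$$
W_1=\int_{\mathcal C_n}|z|^{-1/2}\left\|z^{-1/2}S_n(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}(\varepsilon_{n+1})\right\|dz,
$$

$$
W_2=\int_{\mathcal C_n}|z|^{-1/2}\left\|z^{-1/2}S_n(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}\rho(X_n)\right\|dz,
$$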



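and first prove that $\sqrt{n/k_n}\,W_1$ tends in probability to zero. Let us simplify this first term.

$$
\begin{aligned}
W_1&\leq M\sum_{j=1}^{k_n}\frac{\delta_j}{\sqrt{\lambda_j-\delta_j}}\sup_{z\in B_j}\left\{\left\|z^{-1/2}S_n(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}(\varepsilon_{n+1})\right\|\right\}\\
&\qquad\times\sup_{z\in B_j}\left\{\left\|(zI-\Gamma)^{-1/2}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\right\|_\infty\right\}
\end{aligned}
$$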



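hence

$$
\begin{aligned}
EW_1&\leq M\sum_{j=1}^{k_n}\frac{\delta_j}{\sqrt{\lambda_j-\delta_j}}\sqrt{E\sup_{z\in B_j}\left\|(zI-\Gamma)^{-1/2}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\right\|_\infty^2}\\
&\qquad\times\sqrt{E\sup_{z\in B_j}\left\|z^{-1/2}S_n(zI-\Gamma)^{-1/2}\right\|_\infty^2\;E\sup_{z\in B_j}\left\|(zI-\Gamma)^{-1/2}(\varepsilon_{n+1})\right\|^2}\qquad\text{(i)}\\
&\leq\frac{M}{n}\sum_{j=1}^{k_n}j^{3/2}(\log j)^2\leq\frac{M}{n}\,k_n^{5/2}(\log k_n)^2.\qquad\text{(ii)}
\end{aligned}
$$

From (i) to (ii) I invoke Lemma 5.4, together with the boundedness of the remaining factors involving $\delta_j$ and $\lambda_j$. As a consequence of the above if one chooses $k_n$ such that

$$
\sqrt{\frac{n}{k_n}}\,\frac1n\,k_n^{5/2}(\log k_n)^2=\frac{1}{\sqrt n}k_n^2(\log k_n)^2\to0
$$

we see that $\sqrt{n/k_n}\,W_1$ tends in probability to zero. We turn to the second term $W_2$ and like above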



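$$
\begin{aligned}
W_2&\leq M\sum_{j=1}^{k_n}\frac{\delta_j}{\sqrt{\lambda_j-\delta_j}}\sup_{z\in B_j}\left\{\left\|z^{-1/2}S_n(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}\rho(X_n)\right\|\right\}\\
&\qquad\times\sup_{z\in B_j}\left\{\left\|(zI-\Gamma)^{-1/2}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\right\|_\infty\right\}.
\end{aligned}
$$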



The situation is slightly more complicated than above since $S_n$ is not independent from $X_n$. We introduce a truncation. Assume that $\tau_n$ is an increasing sequence tending to infinity.

$$
W_2=W_2\,1_{\{\|X_n\|\leq\tau_n\}}+W_2\,1_{\{\|X_n\|>\tau_n\}}=W_2^-+W_2^+.
$$



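Obviously $\sqrt{n/k_n}\,W_2^+$ tends in probability to zero since for all $\varepsilon>0$

$$
P\left(\sqrt{\frac{n}{k_n}}\,W_2^+>\varepsilon\right)\leq P\left(\|X_n\|>\tau_n\right)\leq\frac{E\|X_n\|}{\tau_n}.
$$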



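We turn to

$$
\begin{aligned}
W_2^-&\leq M\|\rho\|_\infty\tau_n\sum_{j=1}^{k_n}\frac{\delta_j}{\sqrt{\lambda_j-\delta_j}}\sup_{z\in B_j}\left\{\left\|z^{-1/2}S_n(zI-\Gamma)^{-1/2}\right\|_\infty\left\|(zI-\Gamma)^{-1/2}\right\|_\infty\right\}\\
&\qquad\times\sup_{z\in B_j}\left\{\left\|(zI-\Gamma)^{-1/2}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\right\|_\infty\right\},
\end{aligned}
$$

$$
\begin{aligned}
EW_2^-&\leq M\|\rho\|_\infty\tau_n\sum_{j=1}^{k_n}\frac{\delta_j}{\sqrt{\lambda_j-\delta_j}}\sqrt{E\sup_{z\in B_j}\left\{\left\|z^{-1/2}S_n(zI-\Gamma)^{-1/2}\right\|_\infty^2\left\|(zI-\Gamma)^{-1/2}\right\|_\infty^2\right\}}\\
&\qquad\times\sqrt{E\sup_{z\in B_j}\left\|(zI-\Gamma)^{-1/2}(\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}\right\|_\infty^2}\\
&\leq\frac{M\tau_n}{n}\sum_{j=1}^{k_n}j^{3/2}(\log j)^{3/2}\leq M\,\frac{\tau_nk_n^{5/2}(\log k_n)^{3/2}}{n}
\end{aligned}
$$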



hence $\sqrt{n/k_n}\,W_2^-$ tends in probability to zero whenever $\frac{\tau_nk_n^2(\log k_n)^{3/2}}{\sqrt n}\to0$. Now we choose $\tau_n=\sqrt{\log k_n}$ with $k_n$ as above for $W_1$. This finishes the proof of Proposition 5.1.

5.3 Weakly convergent term

As seen from (11) and from the previous subsection, $S_n\Gamma^\dagger(X_{n+1})$ will fully determine the asymptotics of the predictor:

$$
S_n\Gamma^\dagger(X_{n+1})=\sum_{k=1}^n\left\langle\Gamma^\dagger X_{k-1},X_{n+1}\right\rangle\varepsilon_k=\sum_{k=1}^nZ_{k,n}.
$$

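We decompose $Z_{k,n}$ in three terms

$$
\begin{aligned}
Z_{k,n}&=Z^+_{k,n}+Z^0_{k,n}+Z^-_{k,n}\\
Z^+_{k,n}&=\left\langle\Gamma^\dagger X_{k-1},\varepsilon_{n+1}+\rho(\varepsilon_n)+\dots+\rho^{n-k}(\varepsilon_{k+1})\right\rangle\varepsilon_k\\
Z^0_{k,n}&=\left\langle\Gamma^\dagger X_{k-1},\rho^{n+1-k}\varepsilon_k\right\rangle\varepsilon_k\\
Z^-_{k,n}&=\left\langle\Gamma^\dagger X_{k-1},\rho^{n+2-k}(X_{k-1})\right\rangle\varepsilon_k
\end{aligned}
$$

stemming from

$$
X_{n+1}=\varepsilon_{n+1}+\rho(\varepsilon_n)+\dots+\rho^{n+1-k}(\varepsilon_k)+\rho^{n+2-k}(X_{k-1}).
$$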
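The expansion of X_{n+1} above is just the autoregressive recursion iterated from time k to time n+1. The following finite-dimensional Python sketch checks it numerically for a centered process; the dimension, the matrix standing for the operator rho, the random seed and the variable names are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 4, 10, 3

B = rng.standard_normal((d, d))
rho = 0.6 * B / np.linalg.norm(B, 2)   # hypothetical operator, norm 0.6 < 1

eps = rng.standard_normal((n + 2, d))  # eps[t] is the innovation at time t
x_start = rng.standard_normal(d)       # plays the role of X_{k-1}

# Run the recursion X_t = rho X_{t-1} + eps_t from t = k up to t = n + 1.
x = x_start.copy()
for t in range(k, n + 2):
    x = rho @ x + eps[t]

# Closed form: sum_{i=0}^{n+1-k} rho^i eps_{n+1-i} + rho^{n+2-k} X_{k-1}.
closed = np.linalg.matrix_power(rho, n + 2 - k) @ x_start
for i in range(n + 2 - k):
    closed += np.linalg.matrix_power(rho, i) @ eps[n + 1 - i]

print(np.max(np.abs(x - closed)))
```

The three pieces of the decomposition correspond to the innovations after time k, the innovation at time k, and the term carrying X_{k-1}.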



We will show in Lemma 5.10 below that the series involving $Z^0_{k,n}$ and $Z^-_{k,n}$ are negligible; weak convergence is strictly determined by $\sum_{k=1}^nZ^+_{k,n}$. The asymptotic distribution is given in Proposition 5.2 below. We begin with an important Lemma.

Lemma 5.7 The random sequences $Z^+_{k,n}$ and $Z^-_{k,n}$ are Hilbert-valued martingale difference arrays w.r.t. the sequence $(\mathcal F_k)_k$ where $\mathcal F_k$ is the $\sigma$-algebra generated by

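Proof :

Denoting

$$
X^+_{k,n}=\varepsilon_{n+1}+\rho(\varepsilon_n)+\dots+\rho^{n-k}(\varepsilon_{k+1}),
$$

$$
E\left(Z^+_{k,n}\,|\,\mathcal F_{k-1}\right)=E\left(\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle\varepsilon_k\,|\,\mathcal F_{k-1}\right).
$$

Since $\varepsilon_k$ is independent from $X^+_{k,n}$ and both sequences of random elements are centered we deduce that

$$
E\left(Z^+_{k,n}\,|\,\mathcal F_{k-1}\right)=0.
$$

Then

$$
E\left(Z^-_{k,n}\,|\,\mathcal F_{k-1}\right)=\left\langle\Gamma^\dagger X_{k-1},\rho^{n+2-k}(X_{k-1})\right\rangle E\left(\varepsilon_k\,|\,\mathcal F_{k-1}\right)=0.
$$

Proposition 5.2

$$
S_n^+=\frac{1}{\sqrt{nk_n}}\sum_{k=1}^nZ^+_{k,n}\xrightarrow{w}\mathcal N(0,\Gamma_\varepsilon).
$$

Proof of the Proposition :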

Since $\sum_{k=1}^nZ^+_{k,n}$ is built from an $\mathcal H$-valued martingale difference array we first could hope to apply existing criteria for weak convergence of such sequences. Most of these criteria (see Walk or Rackauskas) rely on convergence in probability for the conditional covariance operator. They do not seem to be adapted to this context (I could not go through with it). I propose that the reader come back to the "sources" of the Central Limit Theorem on infinite dimensional vector spaces. We will simply prove that $S_n^+$ is a uniformly tight sequence and that the finite-dimensional distributions, when computed on a sufficiently large set of functionals, converge to gaussian limits, hence characterizing the limiting covariance operator $\Gamma_\varepsilon$. In order to understand this approach I refer to the paper by A. de Acosta, especially to Theorem 2.3 p. 279.

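For further purpose we begin with a first Lemma in which covariance and cross-covariance operators for the array $Z^+_{k,n}$ are computed.

Lemma 5.8 If $k<\ell$, $E\left(Z^+_{k,n}\otimes Z^+_{\ell,n}\right)=0$ and

$$
E\left(Z^+_{k,n}\otimes Z^+_{k,n}\right)=\Gamma_\varepsilon\left(k_n-\operatorname{tr}\left[\Gamma^\dagger\rho^{n-k+1}\Gamma(\rho^*)^{n-k+1}\right]\right).
$$

Proof. We have

$$
Z^+_{k,n}\otimes Z^+_{\ell,n}=\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle\left\langle\Gamma^\dagger X_{\ell-1},X^+_{\ell,n}\right\rangle(\varepsilon_k\otimes\varepsilon_\ell)
$$

and $X_{\ell-1}=\rho^{\ell-k}(X_{k-1})+\varepsilon_{\ell-1}+\dots+\rho^{\ell-1-k}(\varepsilon_k)$. We tile $Z^+_{k,n}\otimes Z^+_{\ell,n}$ into two terms. We see that

$$
E\left[\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle\left\langle\Gamma^\dagger\rho^{\ell-k}(X_{k-1}),X^+_{\ell,n}\right\rangle(\varepsilon_k\otimes\varepsilon_\ell)\right]=0
$$

since $\varepsilon_k$ is independent from all the other terms. The second term is:

$$
\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle\left\langle\Gamma^\dagger\left(\varepsilon_{\ell-1}+\dots+\rho^{\ell-1-k}(\varepsilon_k)\right),X^+_{\ell,n}\right\rangle(\varepsilon_k\otimes\varepsilon_\ell).
$$

Its expectation is null since $X_{k-1}$ is centered and independent from all the other terms. We focus on the second part of the Lemma.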
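We have

$$
E\left(Z^+_{k,n}\otimes Z^+_{k,n}\right)=E\left[\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle^2(\varepsilon_k\otimes\varepsilon_k)\right]=E\left(\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle^2\right)\Gamma_\varepsilon
$$

and

$$
\begin{aligned}
E\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle^2
&=E\left(E\left(\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle^2\,\big|\,X^+_{k,n}\right)\right)\\
&=E\left\|\Gamma^{1/2}\Gamma^\dagger X^+_{k,n}\right\|^2
=E\left\|\left(\Gamma^\dagger\right)^{1/2}X^+_{k,n}\right\|^2
=\operatorname{tr}\left(\Gamma^\dagger\Gamma^+_{k,n}\right)
\end{aligned}
$$

where

$$
\Gamma^+_{k,n}=\Gamma_\varepsilon+\rho\Gamma_\varepsilon\rho^*+\dots+\rho^{n-k}\Gamma_\varepsilon(\rho^*)^{n-k}
=\Gamma-\rho^{n-k+1}\Gamma(\rho^*)^{n-k+1}.
$$

Then

$$
\begin{aligned}
\operatorname{tr}\left(\Gamma^\dagger\Gamma^+_{k,n}\right)
&=\operatorname{tr}\left(\Gamma^\dagger\Gamma\right)-\operatorname{tr}\left(\Gamma^\dagger\rho^{n-k+1}\Gamma(\rho^*)^{n-k+1}\right)\\
&=k_n-\operatorname{tr}\left(\Gamma^\dagger\rho^{n-k+1}\Gamma(\rho^*)^{n-k+1}\right).
\end{aligned}
$$

The proof of Lemma 5.8 is complete. ■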
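The two algebraic facts used at the end of this proof can be checked numerically in finite dimension: the finite sum of lagged noise covariances equals Gamma minus rho^{n-k+1} Gamma (rho*)^{n-k+1}, and the trace of the spectral cut-off inverse times Gamma equals k_n. The Python sketch below is only an illustration; the dimension, the matrices standing for rho and for the noise covariance, the value of k_n and the variable names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k_n, m = 5, 3, 4                      # m plays the role of n - k

B = rng.standard_normal((d, d))
rho = 0.5 * B / np.linalg.norm(B, 2)     # hypothetical operator, norm 0.5 < 1
C = rng.standard_normal((d, d))
gamma_eps = C @ C.T                      # hypothetical noise covariance


def lagged_sum(n_terms):
    """sum_{i=0}^{n_terms-1} rho^i gamma_eps (rho^T)^i."""
    total = np.zeros((d, d))
    power = np.eye(d)
    for _ in range(n_terms):
        total += power @ gamma_eps @ power.T
        power = rho @ power
    return total


gamma = lagged_sum(200)                  # numerically solves Gamma = Gamma_eps + rho Gamma rho^T
gamma = (gamma + gamma.T) / 2
gamma_plus = lagged_sum(m + 1)           # covariance of eps_{n+1} + ... + rho^{n-k} eps_{k+1}
rho_pow = np.linalg.matrix_power(rho, m + 1)

# Spectral cut-off inverse built on the k_n leading eigenvectors of gamma.
eigval, eigvec = np.linalg.eigh(gamma)
order = np.argsort(eigval)[::-1][:k_n]
gamma_dag = (eigvec[:, order] / eigval[order]) @ eigvec[:, order].T

lhs = np.trace(gamma_dag @ gamma_plus)
rhs = k_n - np.trace(gamma_dag @ rho_pow @ gamma @ rho_pow.T)
print(lhs, rhs)
```

Both sides agree up to rounding, and the correction term is small as soon as the power of rho is large, which is what drives the normalization by n k_n.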

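Now we prove that all the finite-dimensional distributions converge to a gaussian limit. It suffices to get, for all $x$ in $\mathcal H$,

$$
\frac{1}{\sqrt{nk_n}}\sum_{k=1}^n\left\langle Z^+_{k,n},x\right\rangle\xrightarrow{w}\mathcal N\left(0,\sigma^2_{\varepsilon,x}\right) \tag{27}
$$

where $\sigma^2_{\varepsilon,x}=E\langle\varepsilon_1,x\rangle^2$.

Since $\left\langle Z^+_{k,n},x\right\rangle$ is a real valued MDA it suffices to apply the criteria given in Mac Leish [17]. In view of Lemma 5.8 it is enough to prove that $\sum_{k=1}^n\operatorname{tr}\left(\Gamma^\dagger\Gamma^+_{k,n}\right)\sim nk_n$, where

$$
\sum_{k=1}^n\operatorname{tr}\left(\Gamma^\dagger\Gamma^+_{k,n}\right)=nk_n-\sum_{k=1}^n\operatorname{tr}\left(\Gamma^\dagger\rho^{n-k+1}\Gamma(\rho^*)^{n-k+1}\right).
$$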

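The usual properties of the trace provide

$$
\begin{aligned}
\left|\operatorname{tr}\left(\Gamma^\dagger\rho^{n-k+1}\Gamma(\rho^*)^{n-k+1}\right)\right|
&=\left|\operatorname{tr}\left[(\rho^*)^{n-k+1}\Gamma^\dagger\rho^{n-k+1}\Gamma\right]\right|\\
&\leq\left\|(\rho^*)^{n-k}\tilde\rho^*\,\Gamma^{1/2}\Gamma^\dagger\Gamma^{1/2}\,\tilde\rho\,\rho^{n-k}\right\|_\infty\left|\operatorname{tr}\Gamma\right|\\
&\leq\|\rho\|_\infty^{2(n-k)}\,\|\tilde\rho\|_\infty^2\left|\operatorname{tr}\Gamma\right|
\end{aligned}
$$

and we see that whenever $nk_n\to+\infty$

$$
\sum_{k=1}^nE\left(\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle^2\right)\sim nk_n \tag{28}
$$

which ensures (27).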

Now we turn to the second part of the proof, namely: "the sequence $\left(S_n^+\right)_{n\in\mathbb N}$ is tight". Once more we go through a Lemma.

Lemma 5.9 By $\mathcal P_m$ we denote the projector associated to the $m$ first eigenvectors of the covariance operator $\Gamma_\varepsilon$ of $\varepsilon_1$. Then,

$$
\limsup_{m\to+\infty}\,\sup_n\,P\left(\left\|(I-\mathcal P_m)S_n^+\right\|>\varepsilon\right)=0. \tag{29}
$$

Remark 5.1 What we prove is "with prescribed probability the sequence is concentrated in the $\varepsilon$-neighborhood of a finite dimensional space, i.e. $\mathrm{Im}(\mathcal P_m)$". This phenomenon is called flat concentration and ensures the tightness of $\left(S_n^+\right)_{n\in\mathbb N}$ (see de Acosta (1970), Definition 2.1 p. 279).

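Proof of Lemma 5.9 :

$$
P\left(\left\|(I-\mathcal P_m)S_n^+\right\|>\varepsilon\right)\leq\frac{E\left\|(I-\mathcal P_m)S_n^+\right\|^2}{\varepsilon^2}
$$

where

$$
\begin{aligned}
E\left\|(I-\mathcal P_m)S_n^+\right\|^2
&=\frac{1}{nk_n}E\left\|\sum_{k=1}^n\left\langle\Gamma^\dagger X_{k-1},\varepsilon_{n+1}+\rho(\varepsilon_n)+\dots+\rho^{n-k}(\varepsilon_{k+1})\right\rangle(I-\mathcal P_m)\varepsilon_k\right\|^2\\
&=\frac{1}{nk_n}E\left\|(I-\mathcal P_m)\varepsilon_k\right\|^2\left(\sum_{k=1}^nE\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle^2\right)\qquad\text{(ii)}\\
&=\operatorname{tr}\left((I-\mathcal P_m)\Gamma_\varepsilon\right)\frac{1}{nk_n}\left(\sum_{k=1}^nE\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle^2\right).
\end{aligned}
$$

On line (ii) the expectation of all the cross products is null. I skip these calculations since they are exactly like those carried out within Lemma 5.8 above. The computations made in the first part of the proof (see display (28)) are useful here. They ensure that

$$
\sup_n\frac{1}{nk_n}\left(\sum_{k=1}^nE\left\langle\Gamma^\dagger X_{k-1},X^+_{k,n}\right\rangle^2\right)\leq M
$$

where $M$ is some universal constant. At last letting $m$ tend to infinity we get

$$
\lim_{m\to+\infty}\operatorname{tr}\left((I-\mathcal P_m)\Gamma_\varepsilon\right)=0
$$

which proves Lemma 5.9.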

It remains to conclude. Lemma 5.9 ensures that the centered sequence is tight. By (27) we know that the weak limit is gaussian and that its covariance function (hence its covariance operator) is fully characterized: it is the same as that of $\varepsilon_1$. We invoke for instance A. de Acosta (1970) to conclude the proof of Proposition 5.2.



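Lemma 5.10

$$
\frac{1}{\sqrt{nk_n}}\sum_{k=1}^nZ^0_{k,n}\xrightarrow{P}0,\qquad\frac{1}{\sqrt{nk_n}}\sum_{k=1}^nZ^-_{k,n}\xrightarrow{P}0.
$$

Proof :

It is plain that $Z^-_{k,n}$ is an array of non-correlated random elements. We prove that

$$
\frac{1}{nk_n}E\left\|\sum_{k=1}^nZ^-_{k,n}\right\|^2\to0, \tag{30}
$$

$$
\frac{1}{nk_n}E\left\|\sum_{k=1}^nZ^0_{k,n}\right\|^2\to0. \tag{31}
$$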


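$$
\begin{aligned}
E\left\|\sum_{k=1}^nZ^-_{k,n}\right\|^2
&=E\|\varepsilon_1\|^2\sum_{k=1}^nE\left\langle\Gamma^\dagger X_{k-1},\rho^{n+2-k}(X_{k-1})\right\rangle^2\\
&=E\|\varepsilon_1\|^2\sum_{k=1}^nE\left\langle\left(\Gamma^\dagger\right)^{1/2}X_{k-1},\left(\Gamma^\dagger\right)^{1/2}\rho^{n+2-k}(X_{k-1})\right\rangle^2\\
&\leq E\|\varepsilon_1\|^2\sum_{k=1}^nE\left(\left\|\left(\Gamma^\dagger\right)^{1/2}X_{k-1}\right\|^2\left\|X_{k-1}\right\|^2\right)\left\|\left(\Gamma^\dagger\right)^{1/2}\rho^{n+2-k}\right\|_\infty^2\\
&\leq E\|\varepsilon_1\|^2\,E\left(\left\|\left(\Gamma^\dagger\right)^{1/2}X_1\right\|^2\left\|X_1\right\|^2\right)\sum_{k=1}^n\left\|\Gamma^{-1/2}\rho^{n+2-k}\right\|_\infty^2.
\end{aligned}
$$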



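Since the KL expansion yields $X_1=\sum_{j}\langle X_1,e_j\rangle e_j$, we easily see by assumption A2 that

$$
E\left(\left\|\left(\Gamma^\dagger\right)^{1/2}X_1\right\|^2\left\|X_1\right\|^2\right)=\sum_{l=1}^{k_n}\sum_{j=1}^{+\infty}\frac{E\left(\langle X_1,e_l\rangle^2\langle X_1,e_j\rangle^2\right)}{\lambda_l}=O(k_n) \tag{32}
$$

hence (30).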

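We turn to obtaining a bound for the second term. With $Z^0_{k,n}=\left\langle\Gamma^\dagger X_{k-1},\rho^{n+1-k}\varepsilon_k\right\rangle\varepsilon_k$ we get:

$$
\begin{aligned}
E\left\|\sum_{k=1}^nZ^0_{k,n}\right\|^2
&=\sum_{k=1}^nE\left\|Z^0_{k,n}\right\|^2+2\sum_{1\leq i<j\leq n}E\left\langle Z^0_{i,n},Z^0_{j,n}\right\rangle\\
&=\sum_{k=1}^nE\left(\left\langle\Gamma^\dagger X_{k-1},\rho^{n+1-k}\varepsilon_k\right\rangle^2\|\varepsilon_k\|^2\right)\\
&\quad+2\sum_{1\leq i<j\leq n}E\left[\left\langle\Gamma^\dagger X_{i-1},\rho^{n+1-i}\varepsilon_i\right\rangle\langle\varepsilon_i,\varepsilon_j\rangle\left\langle\Gamma^\dagger X_{j-1},\rho^{n+1-j}\varepsilon_j\right\rangle\right].
\end{aligned}
$$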

The first term may be bounded by 



E E 

k=l 



rt 



1/2 



P n+1 ~ k e k 



£ k 



<llplLE(l| ei || 4 )E 

fc=l 



ja—k 



The second term may be rewritten : 



l<i<j<n 

= ^{^X l ^p n+l ~ l e l )(T^p n+1 -^ e {e l ),X j . 1 

l<i<j<n 

= ^ E(rtx J _ 1 , /0 n+1 - 4 ej )(rt /3 ™+ 1 ^r £ (^),/> 7 '~ i ^-i 

l<i<j<n 

l<i<i<n ^ ' 



Taking absolute values we get the bound 



|p|lLE(||ei|| 2 )E 



\X, 



1/2 



Xi 



e\\oo E 

l<i<j'<n 



P n ~ % \\ 

r I 1 oo II r 1 1 oo 



1/ 



-i-l| 



Once again invoking (32) we get (31) in Lemma 5.10.

Proof of Theorem 3.1 :

From all that was done above it is straightforward to deduce that weak convergence for $\rho_n-\rho$ depends only on the term $S_n\Gamma^\dagger$ in (11). We recall it: $S_n\Gamma^\dagger=\sum_{k=1}^n\Gamma^\dagger X_{k-1}\otimes\varepsilon_k$. I guess the reader will agree with the following sentences: "Assume that $(\varepsilon_k)_{k\in\mathbb Z}$ and $(X_k)_{k\in\mathbb Z}$ are independent sequences of independent random elements in $\mathcal H$. Then if in this framework $S_n\Gamma^\dagger$ does not converge weakly, $S_n\Gamma^\dagger$ will not converge weakly in the setting of model (1)". Obviously the situation is much more favourable assuming independence "everywhere".

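Let us assume that $\frac{a_n}{n}S_n\Gamma^\dagger$ converges weakly to some random variable $Z$ for some increasing sequence $a_n$. We deduce that, for any $f\in\mathcal K^*$, the dual space of $\mathcal K$,

$$
f\left(\frac{a_n}{n}\sum_{k=1}^n\Gamma^\dagger X_{k-1}\otimes\varepsilon_k\right)
$$

converges weakly to $f(Z)$. In fact $\mathcal K^*=\mathcal K_1$, the space of trace class operators (see Dunford-Schwartz [13] for this classical result), and the duality bracket is nothing but the usual trace. Consequently we should investigate weak convergence for

$$
\frac{a_n}{n}\sum_{k=1}^n\operatorname{tr}\left(T\left(\Gamma^\dagger X_{k-1}\otimes\varepsilon_k\right)\right)
$$

where $T$ is a trace class operator. To prove Theorem 3.1 it is enough to take $T=u\otimes v$, $u,v\in\mathcal H$. Indeed

$$
\frac{a_n}{n}\sum_{k=1}^n\operatorname{tr}\left((u\otimes v)\left(\Gamma^\dagger X_{k-1}\otimes\varepsilon_k\right)\right)=\frac{a_n}{n}\sum_{k=1}^n\left\langle\Gamma^\dagger X_{k-1},v\right\rangle\langle\varepsilon_k,u\rangle.
$$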

Now we consider two cases depending on the location of $v$ :

1. if $v\in D\left(\Gamma^{-1}\right)$, $\Gamma^\dagger v$ is a bounded sequence that converges to $\Gamma^{-1}v$. It is straightforward to see that $f\left(\frac{1}{\sqrt n}S_n\Gamma^\dagger\right)$ converges in distribution to $f(Z)$ (which is gaussian) by the real CLT for i.i.d. r.v. This means that necessarily $a_n=\sqrt n$.

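2. Let us take a general $v\notin D\left(\Gamma^{-1}\right)$, and compute the variance of the series above:

$$
E\left[\frac{1}{\sqrt n}\sum_{k=1}^n\left\langle\Gamma^\dagger X_{k-1},v\right\rangle\langle\varepsilon_k,u\rangle\right]^2=\sigma_u^2\left\|\left(\Gamma^\dagger\right)^{1/2}v\right\|^2
$$

where $\sigma_u^2=E\langle u,\varepsilon_1\rangle^2$ and

$$
\left\|\left(\Gamma^\dagger\right)^{1/2}v\right\|^2=\sum_{l=1}^{k_n}\frac{\langle v,e_l\rangle^2}{\lambda_l}.
$$

Choosing $\langle v,e_l\rangle^2=\lambda_l$ or $\langle v,e_l\rangle^2=\lambda_l\beta_l$ where $\beta_l\to\beta>0$ we see that $\left\|\left(\Gamma^\dagger\right)^{1/2}v\right\|\to+\infty$ and the real valued random variable $f\left((1/\sqrt n)\,S_n\Gamma^\dagger\right)$ cannot converge weakly since its variance tends to infinity. This shows that the marginals of $(a_n/n)\,S_n\Gamma^\dagger$ do not all converge to the same limiting measure and not all at the same rate, which prevents weak convergence in the topology of $\mathcal K$. Hence Theorem 3.1.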
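The divergence used in the second case can be made concrete: when the squared coordinates of v on the eigenvectors equal the eigenvalues, every term of the regularized sum equals one, so the variance factor grows exactly like k_n. The Python sketch below illustrates this; the spectrum lambda_l = l^(-2), the truncation at 1000 terms and the function name are assumptions made for the example.

```python
def regularized_norm_sq(v_coeff_sq, eigenvalues, k_n):
    """Sum over l <= k_n of <v, e_l>^2 / lambda_l."""
    return sum(c / lam for c, lam in zip(v_coeff_sq[:k_n], eigenvalues[:k_n]))


n_eig = 1000
eigenvalues = [l ** -2.0 for l in range(1, n_eig + 1)]   # hypothetical spectrum
v_coeff_sq = list(eigenvalues)                           # choice <v, e_l>^2 = lambda_l

values = {k_n: regularized_norm_sq(v_coeff_sq, eigenvalues, k_n)
          for k_n in (10, 100, 1000)}
for k_n, val in values.items():
    print(k_n, val)
```

Such a v has summable squared coordinates, so it belongs to the space, yet the regularized norm is unbounded as k_n grows, which is exactly what rules out a common normalizing sequence.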